Skip to content

Conversation

@doc-han
Copy link
Contributor

@doc-han doc-han commented Nov 10, 2025

Description

Implements active, inactive and unavailable states for active collaborators on a workflow.

active - a user is active when a workflow is open and the user is actively on the tab.
inactive - a user is inactive after 12s if they leave the tab where the workflow is open.
unavailable - a user is removed from active collaborators after approx. 1min of closing the tab where the workflow is open.

Main issue

[a] Throttling of browser timers is our biggest villain in this PR.
What problems does it cause?

  • Users are supposed to keep updating their awareness in ydoc by updating their state else they get removed from awareness. there's a 30secs window for this by ydoc else user is marked offline
  • We currently perform this update every 10secs to have 3 chances of updating the ydoc state.
  • Browser throttling reduced the 3 chances to be unpredictable(mostly 0 when the user leaves the tab).
  • Hence, when a user leaves the tab. there's a high chance they'll be removed from ydoc awareness because throttling would've increased our 10secs interval to 30+secs.
    Hence, when a user leaves a tab, they will be marked as offline when throttling kicks in, and when after being throttled, their event finally goes through, they then get marked back as online. making them flicker

[b] Another thing to be aware of 10secs interval
Every 10secs we actually update the user's lastSeen to Date.now() but then when the user leaves the tab, we don't update this lastSeen value. And because there's no state change/diff happening when the user leaves the tab. No event is actually triggered to keep ydoc live.

Approach in this PR

  1. When user leaves the tab. for every 10secs, we increment lastSeen by 1ms which will trigger an event to keep ydoc live. This is to battle [b]

  2. We have a small cache for activeCollaborators that lives for 60secs. when a user shows up as an active-collaborator we keep him for 60secs max. this 60secs is renewed when they show up again in the new set of active collaborators we receive. This resolves [a]

  3. and 2. work hand in hand to smoothen the active collaborators regardless of the ydoc and browser throttling limits.

Side Effect

When a user closes a tab or a workflow. They disappear from other users active-collaborators after 60secs. but users would know inactive collaborators real quick(12secs)

Closes #3931

Validation steps

Note: Some values got changed.
eg.

  1. inactivity time changed from 2min to 12secs.
  2. it also takes approx. 1min(60secs) to remove a user from the active collaborators when they actually leave.

Additional notes for the reviewer

  1. (Is there anything else the reviewer should know or look out for?)

AI Usage

Please disclose how you've used AI in this work (it's cool, we just want to know!):

  • Code generation (copilot but not intellisense)
  • Learning or fact checking
  • Strategy / design
  • Optimisation / refactoring
  • Translation / spellchecking / doc gen
  • Other
  • I have not used AI

You can read more details in our Responsible AI Policy

Pre-submission checklist

  • I have performed a self-review of my code.
  • I have implemented and tested all related authorization policies. (e.g., :owner, :admin, :editor, :viewer)
  • I have updated the changelog.
  • I have ticked a box in "AI usage" in this PR

@github-project-automation github-project-automation bot moved this to New Issues in v2 Nov 10, 2025
@codecov
Copy link

codecov bot commented Nov 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.84%. Comparing base (5e85e2f) to head (2483a1e).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3951   +/-   ##
=======================================
  Coverage   88.83%   88.84%           
=======================================
  Files         422      422           
  Lines       19118    19118           
=======================================
+ Hits        16984    16985    +1     
+ Misses       2134     2133    -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@doc-han
Copy link
Contributor Author

doc-han commented Nov 17, 2025

suggestion from @stuartc, "use phoenix channel callback instead of timeouts to keep awareness active".

@doc-han doc-han force-pushed the 3931-inactive-collaborator-avatar-dissapears-on-the-editor-instead-of-greying-out branch from 561dffc to 43b4845 Compare November 17, 2025 12:34
@doc-han
Copy link
Contributor Author

doc-han commented Nov 17, 2025

update

  1. caching awareness users for 1min to make user availability smooth to others.
  2. reducing online idle time from 2min to 12sec. hence, if you leave the tab, you'll be set to inactive after 12secs instead of the initial 2mins.
  3. when users are active on the tab, they update their lastSeen every 30secs with their current time.
  4. when users leave the tab, we update their lastSeen with their previous lastSeen + 1ms. approx. nothing. just to update the store to keep awareness active.

@doc-han doc-han marked this pull request as ready for review November 18, 2025 08:18
@doc-han doc-han changed the title fix: inactive collaborators should still ping their last seen fix: flickering of active collaborator icons between states Nov 18, 2025
@doc-han doc-han requested review from elias-ba and stuartc November 18, 2025 09:15
Copy link
Contributor

@elias-ba elias-ba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @doc-han! The store implementation and caching mechanism look great. However, I noticed the ActiveCollaborators.tsx component wasn't updated to use the new 12-second threshold. The component still uses lessthanmin(user.lastSeen, 2) on line 47, which is probably causing the JavaScript tests to fail.

@github-project-automation github-project-automation bot moved this from New Issues to In review in v2 Nov 20, 2025
Copy link
Member

@stuartc stuartc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have concerns about the core approach here. This implementation
conflates the Yjs awareness heartbeat (keeping the connection alive)
with user presence (activity/engagement). The result is that user
presence state is now driven by each browser's internal throttling
policies rather than actual user behavior. What's happening:

  • The frozen timestamp technique keeps awareness alive during throttling
  • But we're then using that same lastSeen timestamp to determine if
    users are "active" or "inactive"
  • This means presence state is an indirect side-effect of browser
    throttling, not a direct measurement of user engagement

The constraints this creates:

  1. Browser throttles → lastSeen updates slow → appears inactive
  2. Browser active → lastSeen updates normally → appears active

We're treating the browser's power-saving policies as a signal for user
presence. This varies by: Browser, device & OS, battery state, power
profile...

Extensibility problem:

Where would we add actual presence detection?

If we wanted to:

  • Check document.visibilityState directly
  • Track user interactions (mouse, keyboard)
  • Distinguish "tab switched for a few seconds" from "user left computer"

Would we keep extending lastSeen? That field is already serving two
purposes:

  • Keep Yjs awareness alive/changing (its original job)
  • Indicate user activity (what we're overloading it with)

We discussed the other day: Heartbeat and presence are separate
concerns. The heartbeat exists to prevent connection timeouts - it's
plumbing. User presence is a product feature about engagement/activity.
Mixing them means we can't adjust one without affecting the other.

Example: If we want to show users as inactive faster (say 5 seconds), we
can't - we're locked to the 10-30s window by Yjs's 30-second timeout. If
we want to keep users visible longer after they close a tab, we add
caching layers that delay all removals by 60+ seconds.
The 60-second removal time: Users disappear 60-70 seconds after closing
a tab because we need the cache to smooth over throttling-induced
flickering. But that's a workaround for using throttling as a presence
signal in the first place.

Request: What would it take to separate these? Could we: Use lastSeen
only for connection health (keep current 10s updates) Add a separate
lastActivity and/or lastState field to awareness (active, away, idle) +
timestamp

Update that field based on visibility API, not throttling side-effects
Let the UI read lastActivity and/or lastState instead of calculating
from lastSeen This would give us direct control over presence without
fighting browser throttling behavior. The frozen timestamp technique
might not be needed since it's value only needs to actually change in
order to keep the awareness around. But wouldn't drive the user-facing
presence indicator. I want to understand: what blocked this approach? Is
there a technical constraint I'm missing, or was the lastSeen
overloading the simpler path?

@doc-han
Copy link
Contributor Author

doc-han commented Nov 21, 2025

@stuartc I think I understand clearly what you mean.
So if I'm getting it right. we have to do.

  1. lastSeen - for keeping ydoc awareness alive
  2. lastState - active|away|idle to show the state of the user. based on window visibility API
  • with the separation of these. we stand the chance to be able to show user state in realtime. hence, 2. does a great job
  • with 1. lastSeen will then be updated at our normal 10s. (which might be throttled, causing a flicker).

don't we still need the caching to smoothen out when lastSeen gets throttled? In this case our lastState has been decoupled from it and will be realtime.

thinking...
to make it better, we only cache when lastState moves to away|idle. once out of those states, we clear from cache. I think that will smoothen it out without any effects.

@stuartc
Copy link
Member

stuartc commented Nov 21, 2025

@stuartc I think I understand clearly what you mean. So if I'm getting it right. we have to do.

  1. lastSeen - for keeping ydoc awareness alive
  2. lastState - active|away|idle to show the state of the user. based on window visibility API
  • with the separation of these. we stand the chance to be able to show user state in realtime. hence, 2. does a great job
  • with 1. lastSeen will then be updated at our normal 10s. (which might be throttled, causing a flicker).

don't we still need the caching to smoothen out when lastSeen gets throttled? In this case our lastState has been decoupled from it and will be realtime.

thinking... to make it better, we only cache when lastState moves to away|idle. once out of those states, we clear from cache. I think that will smoothen it out without any effects.

Yeah I agree, we still need the smoothing. But we can use logic like lastActivity > lastSeen i.e. we can derive that they aren't engaged but the window is still open from that.
We can use it to unify multiple tabs for the same user, and pick the one with newest lastActivity.

I think lastState (like an enum) shouldn't be set in the awareness, since we might lose the ability to change it in throttling situations. I see lastState as derived. But having lastMouseMove, lastKeyDown, lastUnfocus (when visibility changes); would all be timestamps that would be updated with a debounce. I'm not saying we want ALL off those right now, but I guess my point is, by having different values for each signal we can derive the exact state we want to show other users, down to the second if we wanted and the smoothing comes just before the presentation layer and the updating of the awareness object happens whenever we need it to without any coupling to how we show the state of a user.

@doc-han
Copy link
Contributor Author

doc-han commented Nov 21, 2025

@stuartc I've opened a new PR around this. separates the log for keeping ydoc alive and the one for updating the last activity. Works very fine so far. with smoothing also.

here #4053
currently handles two activity states. active|away.

@doc-han
Copy link
Contributor Author

doc-han commented Nov 21, 2025

For the consolidation of multiple tabs open by a user. I already have that in place but I've changed how the priority is picked in the other PR. lastActivity > lastSeen just as you mentioned.

@doc-han
Copy link
Contributor Author

doc-han commented Nov 21, 2025

with this said. on #4053

  1. user last activity is tracked realtime active | away.
  2. we keep our 10s lastSeen updates to ydoc.
  3. to battle possible throttling. when a users activity state changes to away, we cache that user for 60s, we only make use of this cache when we can't find the user in our awareness while cache TTL isn't elapsed.

This provides a better user experience than before.

thinking...
@stuartc does it make sense to put in a measure before activating this caching mechanism? eg. checking the time difference between our 10s ydoc keep-alive trigger. if we start noticing near 30s then we activate the caching mechanism.

What do you think about this and the whole caching stuff in general.

@doc-han
Copy link
Contributor Author

doc-han commented Nov 21, 2025

Closing this PR soon in favour of #4053

@doc-han doc-han marked this pull request as draft November 21, 2025 18:14
doc-han and others added 2 commits November 24, 2025 09:12
Implement 60-second cache in awareness store to stabilize collaborator
states and prevent icon flickering between active/inactive/unavailable
states during rapid awareness updates.

Key changes:
- Add userCache Map to AwarenessState with 60s TTL
- Cache users as they disconnect from awareness, showing them as inactive
- Clean up expired cache entries every 30 seconds
- Update ActiveCollaborators to use cached users for stability
- Fix inactive collaborators to continue pinging their last seen timestamp
- Add comprehensive tests for cache behavior and timing

Technical details:
- userCache stores { user, cachedAt } tuples indexed by userId
- When user disconnects from awareness (no longer in cursorsMap), they
  remain in the users array via cache for 60s before being removed
- Page visibility API integration ensures lastSeen updates continue
  even when page is hidden (prevents premature cache expiration)
- Periodic cleanup prevents memory leaks from stale cache entries

Fixes #3931

Co-authored-by: Farhan Yahaya <yahyafarhan48@gmail.com>
Implement a new unified useAwareness() hook that consolidates
useRemoteUsers(), useLiveRemoteUsers(), and useUserCursors() into
a single flexible API.

Key changes:
- Remove excludeLocal option - hook always excludes local user
- Support cached mode for smooth avatar transitions (60s TTL)
- Support map format for efficient Monaco cursor CSS generation
- Add comprehensive test coverage (26 tests)

API:
- useAwareness() - live remote users (for cursors)
- useAwareness({ cached: true }) - cached remote users (for avatars)
- useAwareness({ format: 'map' }) - Map format (for Monaco CSS)

Legacy hooks remain available but are marked as deprecated.

Migrate components to unified useAwareness() API

Migrate all collaborative editor components to use the new
simplified useAwareness() hook and update test mocks.

Component migrations:
- ActiveCollaborators: useRemoteUsers() → useAwareness({ cached: true })
- Cursors: useUserCursors() → useAwareness({ format: 'map' })
- CollaborationWidget: useAwarenessUsers() → useAwareness({ cached: true })
- RemoteCursor: useRemoteUsers() → useAwareness()

Test updates:
- Update mocks to use useAwareness instead of legacy hooks

UX improvements:
- CollaborationWidget now shows "You + N others" for clarity
- RemoteCursor bug fix: cursors now disappear immediately on disconnect

Optimize remote cursor rendering to prevent jumpy movement

Fixes browser-specific issue where remote cursors appeared jumpy when
viewport updates were frequent (during panning/zooming). The problem
was render storms interrupting CSS transitions.

Changes:
- Add position rounding to reduce micro-pixel updates
- Replace left/top positioning with CSS transform for better GPU acceleration
- Wrap RemoteCursor in React.memo with custom comparison to skip re-renders
  when position changes are <1px

This prevents frequent viewport updates from recreating cursor elements
and interrupting their CSS transitions.
@stuartc stuartc force-pushed the 3931-inactive-collaborator-avatar-dissapears-on-the-editor-instead-of-greying-out branch from d61143f to 2483a1e Compare November 24, 2025 07:29
@stuartc stuartc marked this pull request as ready for review November 24, 2025 07:35
@stuartc stuartc merged commit 115ca2f into main Nov 24, 2025
8 checks passed
@stuartc stuartc deleted the 3931-inactive-collaborator-avatar-dissapears-on-the-editor-instead-of-greying-out branch November 24, 2025 07:35
@github-project-automation github-project-automation bot moved this from In review to Done in v2 Nov 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Inactive Collaborator Avatar dissapears on the Editor instead of greying out

4 participants